12 research outputs found

    Computational Workflow for the Fine-Grained Analysis of Metagenomic Samples

    The development of new data acquisition technologies has made an enormous amount of information available in almost every field of scientific research, while also enabling a degree of specialization that results in highly specific software developments. To make it easier for end users to obtain results from their data, a new computing paradigm has emerged strongly: automated workflows for processing information, which have prevailed thanks to the support they provide for assembling a complete and robust processing system. Bioinformatics is a clear example, where many institutions offer specific processing services that generally need to be combined to obtain an overall result. "Workflow managers" such as Galaxy [1], Swift [2] or Taverna [3] are used to analyze data obtained by, among others, new DNA sequencing technologies such as Next Generation Sequencing [4], which produce huge amounts of data in the field of genomics and, in particular, metagenomics. Metagenomics studies the species present in an uncultured sample collected directly from the environment. Studies of interest seek to observe variations in sample composition in order to identify significant differences that correlate with characteristics (phenotype) of the individuals to whom the samples belong; this includes the functional analysis of the species present in a metagenome to understand the consequences that derive from them. Analyzing complete genomes is already a computationally demanding task, so analyzing metagenomes, which contain not just the genome of one species but those of the several species that coexist in the sample, is a Herculean task.
Metagenomic analysis therefore requires efficient algorithms capable of processing these data effectively and in reasonable time. Some of the difficulties to overcome are (1) comparing samples against reference databases, (2) assigning (mapping) reads to genomes using similarity estimators, (3) the processed data are usually large and require practical means of access, (4) the particular nature of each sample requires specific, new programs for its analysis, (5) representing n-dimensional results visually so that they can be understood and (6) verifying quality and certainty at each stage. To this end, we present a complete yet adaptable workflow, divided into modules that can be coupled and reused through defined data structures, which also allows easy extension and customization to meet the demands of new experiments.

    Pairwise and incremental multi-stage alignment of metagenomes: A new proposal

    Traditional comparisons between metagenomes are often performed using reference databases as intermediary templates from which to obtain distance metrics. However, to fully exploit the information contained within metagenomes, it is of interest to remove any intermediate agent that is prone to introducing errors or biased results. In this work, we analyze state-of-the-art methods and conclude that fine-grained methods are necessary to assess similarity between metagenomes. In addition, we propose our own method for accurate and fast matching of reads. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
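
    As a rough illustration of comparing two metagenomes directly, without a reference database as intermediary, the sketch below computes a Jaccard distance over the k-mer sets of two read collections. This is a generic, swapped-in technique chosen for brevity, not the method proposed in the abstract above; the function names, the default k = 4 and the toy reads are all assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_profile(reads, k=4):
    """Union of the k-mers found in every read of one sample."""
    profile = set()
    for read in reads:
        profile |= kmers(read.upper(), k)
    return profile


def jaccard_distance(sample_a, sample_b, k=4):
    """Distance 1 - |A & B| / |A | B| between the k-mer sets of two samples."""
    a = kmer_profile(sample_a, k)
    b = kmer_profile(sample_b, k)
    union = a | b
    if not union:
        # Neither sample produced any k-mer; treat them as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical toy reads, for illustration only.
    sample_1 = ["ACGTACGTAC", "TTGACCA"]
    sample_2 = ["ACGTACGTAC", "GGGGCCCC"]
    print(jaccard_distance(sample_1, sample_2))
```

    A set-based distance like this ignores k-mer abundance and read-level matches, which is one reason finer-grained comparisons may be preferable in practice.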

    Analyzing the differences between reads and contigs when performing a taxonomic assignment comparison in metagenomics

    Metagenomics is an inherently complex field in which one of the primary goals is to determine which organisms are present in an environmental sample. To that end, diverse tools have been developed that are based on the similarity search results obtained from comparing a set of sequences against a database. However, there are still issues to resolve, such as dealing with genomic variants and detecting repeated sequences that could belong to different species in a sample where the organisms are unevenly represented and their proportions are unknown. Hence the question arises of whether analyzing a sample with reads provides a better understanding of the metagenome than analyzing it with contigs. Assembly yields larger genomic fragments but bears the risk of producing chimeric contigs. Reads, on the other hand, are shorter, so their statistical significance is harder to assess, but there are more of them. Consequently, we have developed a workflow to assess and compare the quality of each of these alternatives. Synthetic read datasets belonging to previously identified organisms are generated in order to validate the results. Afterwards, we assemble these into a set of contigs and perform a taxonomic analysis on both datasets. The tools we have developed show that analyzing with reads provides a more trustworthy representation of the species in a sample than analyzing with contigs, especially in cases of high genomic variability. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Workflows and service discovery: a mobile device approach

    Bioinformatics has moved from command-line standalone programs to web-service based environments. This trend has resulted in an enormous number of online resources that can be hard to find and identify, let alone execute and exploit. Furthermore, these resources are generally aimed at solving specific tasks, and these tasks usually need to be combined to achieve the desired results. Finding the appropriate set of tools to build a workflow that solves a problem with the services available in a repository is therefore itself a complex exercise, raising issues such as service discovery, composition and representation. On the technological side, mobile devices have experienced enormous growth in both number of users and technical capabilities. Starting from this reality, in the present paper we propose a solution for service discovery and workflow generation, and we review and discuss distinct approaches to representing workflows in a mobile environment. As a proof of concept, a specific use case has been developed: we have embedded an expanded version of our Magallanes search engine into mORCA, our mobile client for bioinformatics. This combination delivers a powerful and ubiquitous solution that gives the user a handy tool not only to generate and represent workflows, but also to discover services, data types, operations and service types. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Microbiome overview in swine lungs

    Mycoplasma hyopneumoniae is the etiologic agent of swine enzootic pneumonia. However, other mycoplasma species and secondary bacteria also inhabit the swine respiratory tract and can likewise be related to disease. In the present study we performed a total DNA metagenomic analysis of lungs from pigs kept under field conditions, with signs suggestive of enzootic pneumonia and without any signs of infection, to evaluate the bacterial variability of the lung microbiota. Libraries from metagenomic DNA were prepared and sequenced using total DNA shotgun metagenomic pyrosequencing. The metagenomic distribution showed a great abundance of bacteria. The most common microbial families identified in pneumonic swine lungs were Mycoplasmataceae, Flavobacteriaceae and Pasteurellaceae, whereas in carrier swine lungs the most common families were Mycoplasmataceae, Bradyrhizobiaceae and Flavobacteriaceae. Analysis of community composition in both samples confirmed the high prevalence of M. hyopneumoniae. Moreover, the carrier lungs had a more diverse population of families, which is likely related to the normal flora of the lungs. In summary, we provide a wide view of the bacterial population of lungs with and without signs of enzootic pneumonia in a field situation. These bacterial patterns provide information that may be important for establishing disease control measures and may give insights for further studies.

    Ultra-fast genome comparison for large-scale genomic experiments.

    In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost-effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time and resource consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide a software implementation that computes in linear time using one core as a minimum and a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Last, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.

    Training bioinformaticians in High Performance Computing

    In the last decade, bioinformatics has become an indispensable branch of modern scientific research, experiencing an explosion in financial support, developed applications and data collection. The growth of the datasets emerging from research laboratories, industry, the health sector, etc., is steadily raising the demand for computing power and storage. Processing biological data at the scale of these datasets often requires the use of High Performance Computing (HPC) resources, especially when dealing with certain types of omics data, such as genomic and metagenomic data. Such computational resources not only require substantial investments, but also involve high maintenance costs. More importantly, to maintain good returns on these investments, specific training needs to be put in place to ensure that waste is minimized. Furthermore, given that bioinformatics is a highly interdisciplinary field where several other domains intersect (such as biology, chemistry, physics and computer science), researchers from these areas also require bioinformatics-specific training in HPC in order to take full advantage of supercomputing centers. In this document, we describe our experience in training researchers from several different disciplines in HPC as applied to bioinformatics, under the framework of the leading European bioinformatics platform ELIXIR, and analyze both the content and outcomes of the course.

    Mapped reads to the most abundant species identified.

    Normalized data points are separated into two y-axes, one for the M. hyopneumoniae species and the other for the rest of the species. The x-axis shows the most abundant species. The y-axis shows the relative frequency on two scales; left: in green (M01) and purple (M02), the relative frequency scale for M. hyopneumoniae; right: in red (M01) and blue (M02), the relative frequency scale for the remaining species.
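
    The caption does not say how the data points were normalized. As a minimal sketch, assuming that normalization means dividing each species' mapped-read count by the total mapped reads of its sample, the relative frequencies and the split between the two y-axes could be computed as follows. The function names, the placeholder species labels and the read counts are hypothetical and are not taken from the study.

```python
def relative_frequencies(mapped_reads):
    """Divide each species' mapped-read count by the sample total (assumed normalization)."""
    total = sum(mapped_reads.values())
    if total == 0:
        return {species: 0.0 for species in mapped_reads}
    return {species: count / total for species, count in mapped_reads.items()}


def split_for_dual_axis(freqs, dominant):
    """Separate the dominant species (left axis) from the remaining species (right axis)."""
    left = {dominant: freqs[dominant]}
    right = {species: f for species, f in freqs.items() if species != dominant}
    return left, right


if __name__ == "__main__":
    # Hypothetical mapped-read counts for one sample, for illustration only.
    sample = {"M. hyopneumoniae": 900, "Species A": 60, "Species B": 40}
    freqs = relative_frequencies(sample)
    left_axis, right_axis = split_for_dual_axis(freqs, "M. hyopneumoniae")
    print(left_axis)
    print(right_axis)
```

    Plotting the dominant species on its own axis keeps the much smaller frequencies of the remaining species readable.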

    Workflow of samples preparation for metagenomic sequencing.

    The figure shows sample collection, bacterial cell selection, bacterial DNA extraction and shotgun metagenomic sequencing. M01: tracheal and lung lavage pool from 20 pneumonic lungs. M02: tracheal and lung lavage pool from 20 lungs without macroscopic signs of either swine enzootic pneumonia or infection.